A Pāniniān Framework for Analyzing Case Marker Errors in English-Urdu Machine Translation
Abstract: Pāṇini's Kāraka theory takes a syntactico-semantic approach to understanding a natural language, centred on the arguments of the verb. It provides a framework for representing the syntactic relations among constituents in terms of modifier-modified relations, and the semantic relations in terms of Kāraka-Vibhakti pairs (semantic role and postposition). This paper argues that the Pāniniān dependency framework can be used to address MT errors, with special reference to case. First, a corpus of approximately 500 English input sentences was submitted to the Google and Bing online MT platforms. The Urdu output sentences were then collected in bulk. Next, all sentences were evaluated and the case-related errors were categorized against a gold standard. Finally, the Pāniniān dependency framework is proposed for addressing case-related errors in Indian languages.
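The categorization step described above can be sketched as a comparison of Kāraka-Vibhakti pairs between MT output and a gold standard. The mapping and data below are illustrative placeholders, not the authors' actual annotation scheme:

```python
# Illustrative sketch: map each karaka role to a canonical Urdu vibhakti
# (postposition) and flag mismatches against a gold-standard annotation.
# The role labels and marker choices here are simplified assumptions.
KARAKA_VIBHAKTI = {
    "karta": "ne",       # agent (ergative marker in perfective clauses)
    "karma": "ko",       # patient/object
    "karana": "se",      # instrument
    "sampradana": "ko",  # recipient
    "apadana": "se",     # source
    "adhikarana": "men", # locus
}

def categorize_errors(gold, output):
    """Compare (karaka -> postposition) annotations of gold vs MT output.

    Returns a list of (role, expected, found) tuples for each mismatch,
    i.e. each case-marker error in the MT output.
    """
    errors = []
    for role, expected in gold.items():
        found = output.get(role)
        if found != expected:
            errors.append((role, expected, found))
    return errors

# Example: the MT output drops the ergative marker and confuses the
# instrumental marker with the objective one.
gold = {"karta": "ne", "karma": "ko", "karana": "se"}
mt_out = {"karta": None, "karma": "ko", "karana": "ko"}
print(categorize_errors(gold, mt_out))
```

Grouping the mismatches by role then yields the per-category error counts on which an analysis of this kind would rest.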
Anaphors in Sanskrit
Proceedings of the Second Workshop on Anaphora Resolution (WAR II).
Editor: Christer Johansson. NEALT Proceedings Series, Vol. 2 (2008), 11-25.
© 2008 The editors and contributors. Published by the Northern European
Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia),
http://hdl.handle.net/10062/7129
Revisiting Low Resource Status of Indian Languages in Machine Translation
Indian language machine translation performance is hampered by the lack of
large-scale multilingual sentence-aligned corpora and robust benchmarks.
In this paper, we provide and analyse an automated framework for obtaining
such a corpus for Indian language neural machine translation (NMT) systems.
Our pipeline consists of a baseline NMT system, a retrieval module, and an
alignment module, and works with publicly available websites such as
government press releases. The main contribution of this effort is an
incremental method that uses the above pipeline to iteratively grow the
corpus while improving each component of the system. We also evaluate design
choices such as the choice of pivot language and the effect of iteratively
increasing the corpus size. In addition to providing an automated framework,
our work generates a larger corpus than the existing corpora available for
Indian languages. This corpus yields substantially improved results on the
publicly available WAT evaluation benchmark and other standard evaluation
benchmarks.
Comment: 10 pages, few figures, preprint under review
Error Analysis of SaHiT - A Statistical Sanskrit-Hindi Translator
Abstract: The paper presents a statistical Sanskrit-Hindi translator and analyzes the errors generated by the system. The system is trained on the Microsoft Translator Hub (MTHub) platform and is intended only for simple Sanskrit prose texts. The training set comprises 24K parallel sentences and 25K monolingual sentences, with recent BLEU (Bilingual Evaluation Understudy) scores of 41 and above. The paper discusses the error analysis of the system and suggests possible solutions; it also covers the evaluation of the MTHub system with the BLEU metric. For developing the MT system, parallel Sanskrit-Hindi text corpora have been collected or developed manually from the literature, health, news, and tourism domains. The paper also discusses issues and challenges in the development of translation systems for languages like Sanskrit.
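The BLEU metric mentioned above scores a hypothesis by modified n-gram precision against a reference, with a brevity penalty for short outputs. A simplified sentence-level sketch (up to bigrams, no smoothing; real evaluations such as MTHub's use corpus-level BLEU up to 4-grams):

```python
# Simplified, illustrative sentence-level BLEU: geometric mean of modified
# n-gram precisions times a brevity penalty. Not a drop-in replacement for
# the corpus-level metric used in actual MT evaluation.
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference, hypothesis, max_n=2):
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped counts
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any empty overlap zeroes the score
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # exact match scores 1.0
```

A reported score of "41 and above" corresponds to 0.41+ on this 0-1 scale; BLEU is conventionally quoted multiplied by 100.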